Classification Metrics

Review

Modeling steps:

  1. Clean the data
    • What do we do about NA’s?
    • Convert categorical variables to factors
  2. Establish a model
    • Or many models to try?
    • Do we need to tune any hyperparameters?
  3. Establish a recipe
    • Or many recipes to try?
    • How will we transform our variables?
    • Categorical to dummy variables? (but not the response!)
    • (data = full dataset)
  4. Make workflows and put them in workflowsets
  5. Send the workflows to cross-validation for model selection
    • For comparing different models
    • For tuning
    • For comparing different recipes
  6. Send your final model to cross-validation for final metrics
    • Why do we cross-validate for final metrics?
  7. Fit the final model on the full dataset - this is your finished product!
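Steps 4 and 5 might look like this in code. This is just a sketch: it assumes the `knn_mod`, `knn_recipe`, and `ins` objects from the Setup section, plus a hypothetical second recipe called `knn_recipe_2` to compare against.

```r
library(tidymodels)

# Step 4: bundle recipe/model combinations into a workflow set
# (knn_recipe_2 is a made-up second recipe for illustration)
all_wflows <- workflow_set(
  preproc = list(basic = knn_recipe, extra = knn_recipe_2),
  models  = list(knn = knn_mod)
)

# Step 5: send every workflow through the same cross-validation splits
cvs <- vfold_cv(ins, v = 5)

all_wflows %>%
  workflow_map("fit_resamples", resamples = cvs) %>%
  collect_metrics()
```

Because every workflow is resampled on the same folds, `collect_metrics()` gives a fair side-by-side comparison for model selection.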

Setup

library(tidyverse)   # read_csv, mutate, drop_na
library(tidymodels)  # nearest_neighbor, recipe, workflow, vfold_cv, ...

ins <- read_csv("https://www.dropbox.com/s/bocjjyo1ehr5auz/insurance.csv?dl=1")

ins <- ins %>%
  mutate(
    smoker = factor(smoker)
  ) %>%
  drop_na()

knn_mod <- nearest_neighbor(neighbors = 5) %>%
  set_engine("kknn") %>%
  set_mode("classification")

knn_recipe <- recipe(smoker ~ age + bmi + charges, 
                     data = ins)

knn_wflow <- workflow() %>%
  add_recipe(knn_recipe) %>%
  add_model(knn_mod)

cvs <- vfold_cv(ins, v = 5)

knn_fit <- knn_wflow %>%
  fit_resamples(cvs)

Metric 1: Accuracy

Accuracy

What percent of our guesses were correct?

knn_fit %>% collect_metrics()
# A tibble: 3 × 6
  .metric     .estimator   mean     n std_err .config             
  <chr>       <chr>       <dbl> <int>   <dbl> <chr>               
1 accuracy    binary     0.963      5 0.00852 Preprocessor1_Model1
2 brier_class binary     0.0252     5 0.00455 Preprocessor1_Model1
3 roc_auc     binary     0.993      5 0.00213 Preprocessor1_Model1

Accuracy

The problem: Consider this data.

  [1] "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B"
 [19] "B" "B" "B" "B" "B" "A" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B"
 [37] "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B"
 [55] "A" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B"
 [73] "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B"
 [91] "B" "B" "B" "B" "B" "B" "B" "B" "B" "B"

If I guess “B” every time, I’ll have 98% accuracy!
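This is easy to check in base R, using a made-up vector with the same 98-to-2 split as above:

```r
# 98 "B"s and 2 "A"s, like the data above
truth <- c(rep("B", 98), rep("A", 2))

# A "classifier" that always guesses "B", no matter the input
preds <- rep("B", 100)

# Accuracy looks great even though the model learned nothing
mean(preds == truth)
#> [1] 0.98
```

This is why accuracy alone can be misleading when the classes are imbalanced.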

Metric 2: ROC-AUC

ROC

ROC = “receiver operating characteristic” (ew)

False Positive Rate = (how many A’s did we say were B)/(how many A’s are there total)

Of the true A’s, how many did we misclassify as B?

True Positive Rate = (how many B’s did we say were B)/(how many B’s are there total)

Of the true B’s, how many did we catch?
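These two rates can be computed by hand at a single cutoff. A small base-R sketch with made-up truths and probabilities:

```r
# Made-up data: true classes and predicted probabilities of "B"
truth  <- c("B", "B", "A", "B", "A")
prob_B <- c(0.9, 0.7, 0.6, 0.4, 0.1)

cutoff  <- 0.5
guess_B <- prob_B > cutoff

# TPR: of the true B's, how many did we call B?
tpr <- sum(guess_B & truth == "B") / sum(truth == "B")   # 2/3

# FPR: of the true A's, how many did we wrongly call B?
fpr <- sum(guess_B & truth == "A") / sum(truth == "A")   # 1/2
```

Changing the cutoff moves both rates at once, which is exactly what the ROC curve traces out.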

ROC

ROC curve = a plot of TPR vs. FPR across many decision cutoffs

TPR and FPR

First, find the probability that the model assigns to each observation of belonging to the target category of your categorical variable. (You get to decide which category is the target.)

knn_final_fit <- knn_wflow %>%
  fit(ins)

ins <- ins %>%
  mutate(
    prob_smoker = predict(knn_final_fit, ins, type = "prob")$.pred_yes
  )

TPR and FPR

If we choose a cutoff of 0.5, what is our TPR and FPR?

ins |>
  mutate(
    predict_smoker = prob_smoker > 0.5
  ) |>
  count(smoker, predict_smoker)
# A tibble: 4 × 3
  smoker predict_smoker     n
  <fct>  <lgl>          <int>
1 no     FALSE            340
2 no     TRUE               4
3 yes    FALSE              2
4 yes    TRUE              85
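From the counts in the table above, the two rates at cutoff 0.5 fall right out:

```r
# actual yes: 85 flagged correctly, 2 missed
# actual no:  340 cleared correctly, 4 flagged wrongly
tpr <- 85 / (85 + 2)    # of all true smokers, fraction we caught
fpr <- 4 / (4 + 340)    # of all non-smokers, fraction we wrongly flagged
```

So at this cutoff, TPR ≈ 0.977 and FPR ≈ 0.012.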

TPR and FPR

If we choose a cutoff of 0.8, what is our TPR and FPR?

ins |>
  mutate(
    predict_smoker = prob_smoker > 0.8
  ) |>
  count(smoker, predict_smoker)
# A tibble: 3 × 3
  smoker predict_smoker     n
  <fct>  <lgl>          <int>
1 no     FALSE            344
2 yes    FALSE              8
3 yes    TRUE              79

TPR and FPR

If we choose a cutoff of 0.2, what is our TPR and FPR?

ins |>
  mutate(
    predict_smoker = prob_smoker > 0.2
  ) |>
  count(smoker, predict_smoker)
# A tibble: 3 × 3
  smoker predict_smoker     n
  <fct>  <lgl>          <int>
1 no     FALSE            330
2 no     TRUE              14
3 yes    TRUE              87

ROC

ins |>
  mutate(
    smoker = factor(smoker, levels = c("yes", "no"))
  ) |>
  roc_curve(truth = smoker, prob_smoker) %>%
  autoplot()

ROC

GOOD: The ROC curve is well above the diagonal (we can achieve a really good TP rate while keeping the FP rate low)

BAD: The ROC curve sits on the diagonal (TP and FP trade off one-for-one, like random guessing)

ROC-AUC

ROC-AUC is the area under the curve - large values are good!

  • 1 = I always predict perfectly, no matter what the cutoff is. All my predicted probs are 0% or 100%.

  • 0.5 = I predict just as well as random guessing. If I guess “yes” more often, I also get more of the “no”s wrong.

  • Below 0.5 = Yikes.
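The two extremes are easy to demonstrate with `yardstick` on tiny made-up datasets (the column names here are just illustrative):

```r
library(yardstick)
library(tibble)

# Perfectly separated: every "yes" gets a higher probability than
# every "no", so some cutoff classifies everything correctly -> AUC = 1
perfect <- tibble(
  truth    = factor(c("yes", "yes", "no", "no"), levels = c("yes", "no")),
  prob_yes = c(0.9, 0.8, 0.2, 0.1)
)
roc_auc(perfect, truth = truth, prob_yes)

# Uninformative: every observation gets the same probability, so no
# cutoff does better than chance -> AUC = 0.5
coinflip <- tibble(
  truth    = factor(c("yes", "yes", "no", "no"), levels = c("yes", "no")),
  prob_yes = c(0.5, 0.5, 0.5, 0.5)
)
roc_auc(coinflip, truth = truth, prob_yes)
```

Note that `roc_auc()` treats the first factor level as the event by default, which is why `levels = c("yes", "no")` matters here.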

Other ways of talking about TP and FP

Sensitivity and Specificity

  • Sensitivity = how much of the target category do we correctly identify?

= (correctly guessed Category A)/(all actual Category A’s)

= TP/(TP + FN)

  • Specificity = how much of the non-target category do we correctly identify?

= (correctly guessed Category B)/(all actual Category B’s)

= TN/(TN + FP)

Precision and Recall

  • Recall = how much of the target category do we correctly identify?

= (correctly guessed Category A)/(all actual Category A’s)

= TP/(TP + FN)

= Sensitivity!

  • Precision = when we guess the target category, are we correct?

= (correctly guessed Category A)/(all guessed Category A)

= TP/(TP + FP)
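All four of these metrics come from the same confusion-matrix counts. Using the cutoff-0.5 counts from the insurance example earlier (TP = 85, FN = 2, TN = 340, FP = 4):

```r
TP <- 85; FN <- 2; TN <- 340; FP <- 4

sensitivity <- TP / (TP + FN)  # = recall: true smokers we caught
specificity <- TN / (TN + FP)  # true non-smokers we correctly cleared
precision   <- TP / (TP + FP)  # of our "smoker" guesses, fraction correct

c(sensitivity = sensitivity, specificity = specificity, precision = precision)
```

Notice that precision depends on how often we guess the target category, while sensitivity and specificity each depend only on one true class.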

Try it!

Open Activity-Classification-2.qmd again

Go to https://yardstick.tidymodels.org/articles/metric-types.html

Scroll down to the list of metrics

As a group, research one of the metrics that we haven’t discussed in class, and compute it for some of your models.